fix: Check command deadlocks on corrupted DB with page cycles#1202

Open
Abzaek wants to merge 1 commit into
etcd-io:mainfrom
Abzaek:fix/check-command-deadlock-corrupted-db
Abzaek commented May 17, 2026

Summary

The Check() command never returns (unbounded recursion) when the database has a corrupted page structure containing cycles, e.g., a branch page pointing to an ancestor.

Root Cause

recursivelyCheckPage() traverses the page tree without any cycle detection. The reachable map exists but is only used after traversal for reporting unreachable/freed pages — it is never checked during traversal. A corrupted page pointing back to an ancestor causes unbounded recursion.
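
To illustrate the root cause outside of bbolt, here is a minimal sketch of an unguarded recursive walk. The `page` type, the `walkUnguarded` function, and the call budget are all hypothetical and are not taken from the bbolt source; the budget exists only so the demonstration terminates.

```go
package main

import "fmt"

// page is a hypothetical stand-in for a branch page: an ID plus child page IDs.
type page struct {
	id       uint64
	children []uint64
}

// walkUnguarded recurses into every child without remembering which pages
// it has already seen. The budget is an assumption added for this demo only:
// it caps the number of calls so a cyclic input does not recurse forever.
// It returns false if the budget ran out before the walk finished.
func walkUnguarded(pages map[uint64]page, id uint64, budget *int) bool {
	if *budget == 0 {
		return false
	}
	*budget--
	for _, child := range pages[id].children {
		if !walkUnguarded(pages, child, budget) {
			return false
		}
	}
	return true
}

func main() {
	// Page 3 points back to its ancestor, page 1.
	cyclic := map[uint64]page{
		1: {id: 1, children: []uint64{2}},
		2: {id: 2, children: []uint64{3}},
		3: {id: 3, children: []uint64{1}},
	}
	budget := 1000
	fmt.Println("finished:", walkUnguarded(cyclic, 1, &budget))
}
```

On the cyclic input the walk exhausts any finite budget, which is the behavior described above: without a visited check during traversal, nothing stops the recursion.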

Impact

This affects etcd users who run snapshot status or check against a potentially corrupted database. The etcdctl snapshot status command hangs indefinitely, which blocks any recovery step that depends on it completing.

Fix

Check if the page ID is already in the reachable map before recursing. If it is, emit an error and return instead of recursing infinitely.
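
The guard described above can be sketched in isolation as follows. This is a simplified model under stated assumptions, not the actual patch: the `pgid` and `page` types, the `checkPage` function, the error text, and the error slice are hypothetical names chosen for illustration.

```go
package main

import "fmt"

// pgid and page are hypothetical simplifications of a page ID and a branch page.
type pgid uint64

type page struct {
	children []pgid
}

// checkPage walks the tree rooted at id. The reachable map doubles as the
// visited set: if id is already present, the page was reached before, so the
// function records an error and returns instead of recursing again.
func checkPage(pages map[pgid]page, id pgid, reachable map[pgid]bool, errs *[]error) {
	if reachable[id] {
		*errs = append(*errs, fmt.Errorf("page %d: already visited, possible cycle", id))
		return
	}
	reachable[id] = true
	for _, child := range pages[id].children {
		checkPage(pages, child, reachable, errs)
	}
}

func main() {
	// Page 3 points back to its ancestor, page 1.
	pages := map[pgid]page{
		1: {children: []pgid{2}},
		2: {children: []pgid{3}},
		3: {children: []pgid{1}},
	}
	reachable := map[pgid]bool{}
	var errs []error
	checkPage(pages, 1, reachable, &errs)
	for _, err := range errs {
		fmt.Println(err)
	}
}
```

Note that a check of this shape flags any page reached a second time, so it reports both true cycles and pages that are referenced from two different parents. Each page is visited at most once, so the walk terminates on any finite input.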

Fixes #877

When a database has a corrupt page structure with cycles (e.g., a branch
page referencing an ancestor), recursivelyCheckPage enters infinite
recursion, causing Check() to never complete.

Fix: detect cycles by checking if the page ID is already in the reachable
set before recursing. If already reachable, report the cycle and return.

Fixes etcd-io#877
@k8s-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Abzaek
Once this PR has been reviewed and has the lgtm label, please assign serathius for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@Elbehery
Member

Thanks for your contribution.

Do you have logs that show a reproduction of this bug?

Also, would you mind adding test cases that exercise this case?

Development

Successfully merging this pull request may close these issues.

Check of corrupted file deadlocks